A beginner's guide to coding qualitative data, both manually and automatically. Learn to build a coding frame and find significant themes in your data!
Coding qualitative data is essential for transforming unstructured feedback into actionable insights. Whether you're analyzing customer sentiment, product reviews, or survey responses, coding enables you to systematically categorize themes, patterns, and trends in textual data.
In this guide, you’ll learn:
By the end, you’ll be equipped with practical methods to streamline data processing, improve reporting accuracy, and extract meaningful insights from open-ended responses.
Coding is a crucial step in qualitative research and plays a central role in ensuring the validity and reliability of findings. It is the process of labeling and organizing your qualitative data to identify different themes and the relationships between them.
When coding customer feedback, you assign labels to words or phrases that represent important (and recurring) themes in each response. These labels can be words, phrases, or numbers; we recommend using words or short phrases, as they’re easier to remember, skim, and organize.
But coding is more than just categorization: it is an iterative and reflective process. As researchers engage with the data, they refine their codes, uncovering deeper meanings in the text. As Paul Mihas mentions in his book:
“One of the strengths of coding is that it sustains this period of wonder, of checking and rechecking, naming and renaming, and ‘diving in and stepping back.’ Coding creates a conceptual foreground against the larger canvas of copious data.”
Coding qualitative data to find common themes and concepts is part of thematic analysis, which extracts themes from text by analyzing the meaning of words and sentence structure.
Within the context of customer feedback, it’s important to understand the many different types of qualitative feedback a business can collect, such as open-ended surveys, social media comments, and reviews. These qualitative insights provide deeper context and help businesses uncover trends that may not be visible through numerical data alone.
Qualitative data analysis is the process of examining and interpreting qualitative data to understand what it represents. Coding is a core part of that process and is essential for capturing the depth and complexity of the data.
Qualitative analysis draws on several methods, such as thematic analysis (which can be inductive or deductive), emotion coding, and content analysis. Each of these relies on coding the data, and careful coding is vital to the validity of the analysis.
Qualitative data is defined as any non-numerical and unstructured data; when looking at customer feedback, qualitative data usually refers to any verbatim or text-based feedback such as reviews, open-ended responses in surveys, complaints, chat messages, customer interviews, case notes or social media posts.
For example, on its own the NPS metric is strictly quantitative, but when you ask customers why they gave you a particular score, you need qualitative data analysis methods to understand the comments they leave alongside their numerical responses.
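To make the split between the number and the comments concrete, here is a minimal Python sketch. It uses the standard NPS buckets (9 to 10 promoters, 7 to 8 passives, 0 to 6 detractors; NPS is the percentage of promoters minus the percentage of detractors) and groups each comment by segment so the open-ended text can be coded separately. The function names and sample responses are hypothetical.

```python
from collections import defaultdict


def nps_segment(score: int) -> str:
    """Standard NPS buckets: 9-10 promoter, 7-8 passive, 0-6 detractor."""
    if not 0 <= score <= 10:
        raise ValueError(f"NPS scores run from 0 to 10, got {score}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def nps(scores: list[int]) -> float:
    """NPS = % promoters - % detractors, on a -100 to 100 scale."""
    if not scores:
        raise ValueError("need at least one score")
    segments = [nps_segment(s) for s in scores]
    return 100 * (segments.count("promoter") - segments.count("detractor")) / len(segments)


def comments_by_segment(responses: list[tuple[int, str]]) -> dict[str, list[str]]:
    """Group the open-ended comments by NPS segment so each group can be coded."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for score, comment in responses:
        grouped[nps_segment(score)].append(comment)
    return dict(grouped)


# Hypothetical survey responses: (score, "why did you give this score?" comment)
responses = [
    (10, "Setup was quick and support answered within minutes."),
    (9, "Love the new dashboard."),
    (7, "Does the job, but exporting is clunky."),
    (3, "Waited 40 minutes on hold."),
]

print(nps([score for score, _ in responses]))  # 25.0 (2 promoters, 1 detractor, 4 responses)
print(comments_by_segment(responses)["detractor"])
```

The score tells you how many customers are unhappy; only the grouped comments, once coded, tell you why.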
Thematic analysis uncovers themes by analyzing the patterns and relationships in a set of qualitative data. A theme emerges (or is built) when related findings appear meaningful and occur multiple times. Anyone can use thematic analysis to organize open-ended responses, online reviews, and other qualitative data into significant themes. Thematic analysis coding helps categorize data extracts and derive themes and patterns, which makes it easier to identify themes revolving around a particular concept or phenomenon; the method is widely used in the social sciences.
Content analysis refers to the categorization, tagging, and thematic analysis of qualitative data. Essentially, content analysis quantifies themes by counting the occurrences of concepts, topics, or themes. It can also combine the categories found in qualitative data with quantitative data, such as behavioral or demographic data, for deeper insights.
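As a rough illustration of the counting side of content analysis, the Python sketch below tallies how many responses carry each code and then combines one code with a demographic field. The response records, segment names, and codes are all hypothetical; in practice the codes would come from your manual or automated coding.

```python
from collections import Counter

# Hypothetical coded responses: the codes a coder assigned to each response,
# plus one piece of demographic data (customer segment) to combine them with.
coded_responses = [
    {"segment": "enterprise", "codes": {"wait time", "helpful staff"}},
    {"segment": "enterprise", "codes": {"wait time"}},
    {"segment": "small business", "codes": {"pricing"}},
    {"segment": "small business", "codes": {"wait time", "pricing"}},
]


def code_frequencies(responses):
    """Count how many responses were tagged with each code (codes are sets, so once per response)."""
    return Counter(code for r in responses for code in r["codes"])


def code_share_by_segment(responses, code):
    """Share of responses in each segment that were tagged with `code`."""
    totals = Counter(r["segment"] for r in responses)
    hits = Counter(r["segment"] for r in responses if code in r["codes"])
    return {segment: hits[segment] / totals[segment] for segment in totals}


print(code_frequencies(coded_responses).most_common(2))
# [('wait time', 3), ('pricing', 2)]
print(code_share_by_segment(coded_responses, "wait time"))
# {'enterprise': 1.0, 'small business': 0.5}
```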
Some qualitative data, such as interviews or field notes, may contain a story about how someone experienced something: for example, choosing a product, using it, evaluating its quality, and deciding whether to buy it again. The goal of narrative analysis is to turn these individual narratives into data that can be coded, then analyze it to understand how events or experiences affected the people involved. Process coding is particularly useful here for identifying phases, sequences, and movements within a story. It captures actions in the data using codes that are typically gerunds (words ending in '-ing'), which gives a dynamic account of events.
Discourse analysis examines what people say in its social and cultural context. Its goal is to understand user or customer behavior by uncovering their beliefs, interests, and agendas, which are reflected in the way they express their opinions, preferences, and experiences. Structural coding can be applied here: it organizes data according to predetermined structures, such as the elements of a discourse, to support the analysis. Discourse analysis is particularly useful when your focus is on building or strengthening a brand, for example by examining how customers use metaphors and rhetorical devices.
When performing qualitative data analysis, it is useful to have a framework to organize the buckets of meaning. A taxonomy or code frame (a hierarchical set of themes used in coding qualitative data) is an example of the result. However, don't fall into the trap of starting with a framework just to organize your data faster. Instead, analyze the data to see how themes relate to each other, and keep checking that those relationships hold up.
Grounded theory starts by formulating a theory around a single data case, so the theory is “grounded” in actual data. Additional cases are then examined to see whether they are relevant and can add to the original theory.
Coding qualitative data makes it easier to interpret customer feedback. Assigning codes to words and phrases in each response helps capture what the response is about, which in turn helps you better analyze and summarize the results of the entire survey.
Researchers use coding and other qualitative data analysis processes to help them make data-driven decisions based on customer feedback. When you use coding to analyze your customer feedback, you can quantify the common themes in customer language. This makes it easier to accurately interpret and analyze customer satisfaction.
But manual coding can be time-consuming and challenging at scale. Many companies struggle to keep up with large volumes of customer feedback while maintaining accuracy. Atlassian, a leading software company, faced this exact challenge and found a scalable solution through automated qualitative analysis.
Atlassian, known for products like Jira and Trello, received an overwhelming volume of qualitative feedback across multiple channels. Their research and product teams spent six weeks manually categorizing qualitative data, which only accounted for a fraction of their total dataset. This slow process limited their ability to extract meaningful insights and make timely product improvements.
By integrating Thematic’s AI-driven qualitative data analytics, Atlassian automated its feedback categorization, significantly reducing analysis time while maintaining accuracy. Instead of spending weeks on manual coding, the team could now process insights in real time, identify key themes, and prioritize product updates based on qualitative trends. This transformation helped Atlassian maintain high user satisfaction and continuously improve its products.
Thematic coding, also called thematic analysis, is a type of qualitative data analysis that finds themes in text by analyzing the meaning of words and sentence structure.
When you use thematic coding to analyze customer feedback, for example, you can learn which themes appear most frequently. This helps you understand what drives customer satisfaction in an accurate, actionable way.
To learn more about how Thematic analysis software helps you automate the data coding process, check out this article.
Methods of coding qualitative data fall into three categories: automated coding, manual coding, and a blend of the two.
You can automate the coding of your qualitative data with thematic analysis software. Thematic analysis and qualitative data analysis software use artificial intelligence (AI) and natural language processing (NLP) to code your qualitative data and break text up into themes.
The BEST thematic analysis software is autonomous, which means…
…all of which will save you time (and lots of unnecessary headaches) when analyzing your customer feedback.
Businesses are also seeing the benefit of using thematic analysis software. The capacity to aggregate data sources into a single source of analysis helps to break down data silos, unifying the analysis and insights across departments. This is now being referred to as omnichannel analysis or unified data analytics.
Whether you have tons of customer reviews, support chats, customer service conversations (conversational analytics), or open-ended survey responses, thematic analysis software like Thematic brings every valuable insight to the surface while saving you thousands of hours.
Advances in AI have made it possible to automate the analysis of qualitative data, in particular content and framework analysis. General-purpose tools like ChatGPT and Copilot are widely used for coding qualitative data on a small scale. Specialized solutions for this task, such as Thematic, are often referred to as text analytics software.
There are many benefits of automation:
Automating most of the analysis has become the preferred choice for many users. But no matter what solution you use, human oversight is still required to verify accuracy. For example, Thematic has a Themes Editor tool that makes this oversight easy: its drag-and-drop functionality lets you refine, validate, and rename themes as you get more data. By guiding the AI, you can ensure your results are always precise, easy to understand, and perfectly aligned with your objectives.
Don't just take it from us. Here's what some of our customers have to say:
I'm a fan of Thematic's ability to save time and create heroes. It does an excellent job using a single view to break down the verbatims into themes displayed by volume, sentiment and impact on our beacon metric, often but not exclusively NPS.
It does a superlative job using GenAI in summarizing a theme or sub-theme down to a single paragraph making it clear what folks are trying to say. Peter K, Snr Research Manager.
Thematic is a very intuitive tool to use. It boasts a robust level of granularity, allowing the user to see the general breadth of verbatim themes, dig into the sub-themes, and further into the sentiment of the open text itself. Artem C, Sr Manager of Research. LinkedIn.
For the rest of this post, we'll focus on manual coding. Different researchers have different processes, but manual coding usually looks something like this:
Let's break it down a little further…
Before you start qualitative data coding, you need to decide which codes you'll use.
Deductive coding means you start with a predefined set of codes, then assign those codes to the new qualitative data. These codes might come from previous research, or you might already know what themes you're interested in analyzing. Deductive coding is also called concept-driven coding.
For example, let's say you're conducting a survey on customer experience. You want to understand the problems that arise from long call wait times, so you choose to make “wait time” one of your codes before you start looking at the data.
The deductive approach can save time and help ensure that your areas of interest are coded. But be careful of bias: when you start with predefined codes, you already have expectations about what the answers will be. Make sure you don't miss other important themes by focusing too hard on proving your own hypothesis.
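A minimal Python sketch of the deductive workflow, assuming a hypothetical codebook in which each predefined code has a few keyword patterns. Keyword matching is only a crude stand-in for a coder applying code definitions, but it shows the shape of the approach, and the hypothetical `uncoded` helper addresses the bias risk above by surfacing responses that none of your predefined codes cover.

```python
import re

# Hypothetical predefined codebook, decided before looking at the data.
# Keyword patterns stand in for a human coder's judgment, for illustration only.
CODEBOOK = {
    "wait time": [r"\bwait(ed|ing)?\b", r"\bon hold\b", r"\bqueue\b"],
    "staff attitude": [r"\brude\b", r"\bfriendly\b", r"\bpolite\b"],
}


def apply_codes(response: str, codebook=CODEBOOK) -> set[str]:
    """Return every predefined code whose patterns match the response."""
    text = response.lower()
    return {
        code
        for code, patterns in codebook.items()
        if any(re.search(p, text) for p in patterns)
    }


def uncoded(responses, codebook=CODEBOOK):
    """Responses no predefined code matched: read these for themes you didn't anticipate."""
    return [r for r in responses if not apply_codes(r, codebook)]


responses = [
    "I waited 45 minutes before anyone picked up.",
    "The agent was friendly but I was on hold forever.",
    "Your invoices are impossible to read.",
]
for r in responses:
    print(sorted(apply_codes(r)), "|", r)
print(uncoded(responses))  # only the invoice comment is left uncoded
```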
Inductive coding, also called open coding, starts from scratch and creates codes based on the qualitative data itself. You don't have a set codebook; all codes arise directly from the survey responses.
Here's how inductive coding works:
If you add a new code, split an existing code into two, or change the description of a code, review how this change affects the coding of all responses. Otherwise, similar responses coded at different points in the survey could end up with different codes.
Sounds like a lot of work, right? Inductive coding is an iterative process, which means it takes longer and is more thorough than deductive coding. A major advantage is that it gives you a more complete, unbiased look at the themes throughout your data.
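The re-review step for a split code can be sketched in Python as follows. The response IDs, code names, and the `split_code` helper are hypothetical; the helper only retires the old code and lists every response that used it, because deciding which new code each response belongs to still requires re-reading it.

```python
# Hypothetical coded data: response ID -> codes assigned so far.
coded = {
    1: {"pricing"},
    2: {"pricing", "support"},
    3: {"support"},
    4: {"pricing"},
}


def split_code(coded, old_code, new_codes):
    """Retire `old_code` and return the IDs of every response that used it.

    Splitting a code changes what it means, so each affected response has to be
    re-read and assigned one or more of `new_codes`; this helper only finds them.
    """
    affected = sorted(rid for rid, codes in coded.items() if old_code in codes)
    for rid in affected:
        coded[rid].discard(old_code)
    print(f"Split {old_code!r} into {new_codes}; re-review responses {affected}")
    return affected


to_review = split_code(coded, "pricing", ["price too high", "confusing billing"])
# Split 'pricing' into ['price too high', 'confusing billing']; re-review responses [1, 2, 4]
```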
In practice, most researchers use a blend of inductive and deductive approaches to coding.
For example, with Thematic, the AI inductively comes up with themes while framing the analysis so that it reflects how business decisions are made. At the end of the analysis, researchers use the Themes Editor to iterate on and refine the themes. Then, in the next wave of analysis, as new data comes in, the AI starts deductively from the existing theme taxonomy.
Once you create your codes, you need to put them into a coding frame. A coding frame represents the organizational structure of the themes in your research. There are two types of coding frames: flat and hierarchical.
A flat coding frame assigns the same level of specificity and importance to each code. While this might feel like an easier and faster method for manual coding, it can be difficult to organize and navigate the themes and concepts as you create more and more codes. It also makes it hard to figure out which themes are most important, which can slow down decision making.
Hierarchical frames help you organize codes based on how they relate to one another. For example, you can organize the codes based on your customers' feelings on a certain topic:
Hierarchical Coding Frame example
In this example:
Hierarchical framing supports a larger code frame and lets you organize codes based on organizational structure. It also allows for different levels of granularity in your coding.
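One way to represent a hierarchical frame is a simple mapping from parent themes to child codes. The Python sketch below, with hypothetical theme names and counts, rolls child-code counts up to their parents so you can compare themes at either level of granularity.

```python
# Hypothetical two-level coding frame: parent themes -> child codes.
FRAME = {
    "Product": ["useful features", "unnecessary features", "poor quality"],
    "Customer service": ["wait time", "helpful staff"],
}


def parent_of(code, frame=FRAME):
    """Look up which parent theme a child code belongs to."""
    for parent, children in frame.items():
        if code in children:
            return parent
    raise KeyError(f"code {code!r} is not in the coding frame")


def rollup(code_counts, frame=FRAME):
    """Aggregate child-code counts up to their parent themes."""
    totals = {parent: 0 for parent in frame}
    for code, count in code_counts.items():
        totals[parent_of(code, frame)] += count
    return totals


child_counts = {"useful features": 12, "poor quality": 7, "wait time": 15, "helpful staff": 4}
print(rollup(child_counts))  # {'Product': 19, 'Customer service': 19}
```

Note that this roll-up sums mentions: a response tagged with two child codes under the same parent is counted twice at the parent level. If you need unique responses per theme, count response IDs instead.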
Whether your code frames are hierarchical or flat, your code frames should be flexible. Manually analyzing survey data takes a lot of time and effort; make sure you can use your results in different contexts.
For example, if your survey asks customers about customer service, you might only use codes that capture answers about customer service. Then you realize that the same survey responses have a lot of comments about your company's products. To learn more about what people say about your products, you may have to code all of the responses from scratch! A flexible coding frame covers different topics and insights, which lets you reuse the results later on.
Now that you know the basics of coding your qualitative data, here are some tips on making the most of your qualitative research.
As you code more and more data, it can be hard to remember all of your codes off the top of your head. Tracking your codes in a codebook helps keep you organized throughout the data analysis process. Your codebook can be as simple as an Excel spreadsheet or word processor document. As you code new data, add new codes to your codebook and reorganize categories and themes as needed.
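If you outgrow a hand-edited spreadsheet, a codebook can also live as structured records. The Python sketch below shows one possible layout, assuming each code has a name, a description, an optional parent theme, and example quotes (these fields are hypothetical; track whatever your team needs), and exports it to CSV so it still opens in Excel.

```python
import csv
import io
from dataclasses import asdict, dataclass, field


@dataclass
class Code:
    name: str
    description: str
    parent: str = ""  # empty for top-level themes
    examples: list[str] = field(default_factory=list)


# Hypothetical codebook entries.
codebook = [
    Code("Customer service", "Any comment about interactions with support staff."),
    Code(
        "Wait time",
        "Complaints or praise about how long it took to get help.",
        parent="Customer service",
        examples=["I was on hold for 40 minutes"],
    ),
]


def to_csv(codes) -> str:
    """Serialize the codebook to CSV text, joining example quotes with ' | '."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "description", "parent", "examples"])
    writer.writeheader()
    for code in codes:
        row = asdict(code)
        row["examples"] = " | ".join(code.examples)
        writer.writerow(row)
    return buf.getvalue()


print(to_csv(codebook))
```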
Make sure to track:
The code should be generic enough to apply to multiple comments, but specific enough to be useful in your analysis. For example, “Product” is a broad code that will cover a variety of responses — but it's also pretty vague. What about the product? On the other hand, “Product stops working after using it for 3 hours” is very specific and probably won't apply to many responses. “Poor product quality” or “short product lifespan” might be a happy medium.
Having similar codes is okay as long as they serve different purposes. “Customer service” and “Product” are different enough from one another, while “Customer service” and “Customer support” may have subtle differences but should likely be combined into one code.
Try to create codes that contrast with each other to track both the positive and negative elements of a topic separately. For example, “Useful product features” and “Unnecessary product features” would be two different codes to capture two different themes.
Let's look at the two extremes: There are as many codes as there are responses, or each code applies to every single response. In both cases, the coding exercise is pointless; you don't learn anything new about your data or your customers. To make your analysis as useful as possible, try to find a balance between having too many and too few codes.
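To spot the two extremes, you can check what share of responses each code covers. In the Python sketch below the data is hypothetical and the `too_rare` and `too_common` thresholds are arbitrary assumptions for illustration, not established cut-offs; a flagged code is a prompt to reconsider it, not an automatic verdict.

```python
from collections import Counter


def code_coverage(coded_responses):
    """Fraction of responses each code is applied to."""
    n = len(coded_responses)
    counts = Counter(code for codes in coded_responses for code in set(codes))
    return {code: counts[code] / n for code in counts}


def flag_codes(coded_responses, too_rare=0.05, too_common=0.8):
    """Return (codes that are almost never used, codes used almost everywhere)."""
    coverage = code_coverage(coded_responses)
    rare = sorted(c for c, share in coverage.items() if share < too_rare)
    common = sorted(c for c, share in coverage.items() if share > too_common)
    return rare, common


# Hypothetical coded data: one list of codes per response (10 responses).
coded = (
    [["product", "wait time"]] * 3
    + [["product"]] * 4
    + [["product", "stops working after 3 hours"]]
    + [["product", "pricing"]]
    + [["pricing"]]
)

rare, common = flag_codes(coded, too_rare=0.15)
print(rare)    # ['stops working after 3 hours']  too specific to be useful
print(common)  # ['product']  too broad to tell you anything
```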
Make sure to group responses with the same themes under the same code, even if they don't use the same exact wording. For example, a code such as “cleanliness” could cover responses including words and phrases like:
Having only a few codes and hierarchical framing makes it easier to group different words and phrases under one code. If you have too many codes, especially in a flat frame, your results can become ambiguous and themes can overlap. Manual coding also requires the coder to remember or be able to find all of the relevant codes; the more codes you have, the harder it is to find the ones you need, no matter how organized your codebook is.
Manually coding qualitative data means that the coder's cognitive biases can influence the coding process. For each study, make sure you have coding guidelines and training in place to keep coding reliable, consistent, and accurate.
One thing to watch out for is definitional drift, which occurs when data at the beginning of the dataset is coded differently from material coded later. Check for definitional drift across the entire dataset, and keep notes describing how the codes vary across the results.
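A rough way to check for definitional drift, sketched in Python: split the responses in the order they were coded, compare how often each code is used in the first and second halves, and flag large shifts. The data and the `threshold` value are hypothetical. A flagged shift can also reflect a genuine change in the data (for example, if responses are ordered by date), so treat it as a reason to re-read those responses rather than proof of drift.

```python
from collections import Counter


def code_shares(batch):
    """Fraction of responses in `batch` tagged with each code."""
    n = len(batch)
    counts = Counter(code for codes in batch for code in set(codes))
    return {code: counts[code] / n for code in counts}


def drift_report(coded_in_order, threshold=0.2):
    """Compare code usage in the first vs. second half of the coding order.

    Returns {code: change in share} for codes whose share moved by at least `threshold`.
    """
    mid = len(coded_in_order) // 2
    early = code_shares(coded_in_order[:mid])
    late = code_shares(coded_in_order[mid:])
    flagged = {}
    for code in sorted(set(early) | set(late)):
        diff = late.get(code, 0.0) - early.get(code, 0.0)
        if abs(diff) >= threshold:
            flagged[code] = round(diff, 2)
    return flagged


# Hypothetical coding order: the coder seems to start using "wait time"
# for comments that were tagged "support" earlier in the dataset.
coded_in_order = [
    ["support"], ["support"], ["support"], ["pricing"],
    ["support"], ["wait time"], ["wait time"], ["pricing"],
]
print(drift_report(coded_in_order))  # {'support': -0.5, 'wait time': 0.5}
```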
If you have multiple coders working on one team, have them check one another's coding to help eliminate cognitive biases.
Here are 6 final takeaways for manually coding your qualitative data:
If you've made it this far, you'll likely be interested in our free guide: Best practices for analyzing open-ended questions.
The guide includes some of the topics covered in this article, and goes into some more niche details.
If your company is looking to automate its qualitative coding process, try Thematic!
If you're looking to trial multiple solutions, check out our free buyer's guide. It covers what to look for when trialing different feedback analytics solutions to ensure you get the depth of insights you need.
Happy coding!
To maintain consistency, establish a detailed codebook that defines each code, provides examples, and outlines when to apply them. Conduct intercoder reliability tests, where multiple researchers code the same data and compare results. If discrepancies arise, refine the coding guidelines. Regular team discussions, calibration exercises, and training sessions help align interpretations. Using qualitative data analysis software can also standardize coding. Thematic, for example, enables teams to collaboratively refine and validate codes, ensuring reliability across different datasets without manual inconsistencies.
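The article doesn't prescribe a particular reliability statistic; one common choice for two coders is Cohen's kappa, which compares observed agreement with the agreement expected by chance: `kappa = (p_o - p_e) / (1 - p_e)`. The Python sketch below computes it for one code at a time, where each coder records whether the code applies to each response. The data is hypothetical; libraries such as scikit-learn also provide `sklearn.metrics.cohen_kappa_score`.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who each assigned one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # p_o: fraction of items where both coders chose the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: chance agreement from each coder's own label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical: did each coder apply the "wait time" code to each of 10 responses?
coder_a = [True, True, True, False, False, False, False, True, False, False]
coder_b = [True, True, False, False, False, False, False, True, False, True]

print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.583
```

With the sample data the coders agree on 8 of 10 responses (p_o = 0.8), chance agreement is 0.4 × 0.4 + 0.6 × 0.6 = 0.52, and kappa is about 0.58. A low kappa on a code is a signal to revisit that code's definition and examples in the codebook.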
One common mistake is overcoding, where too many specific codes are created, making analysis overly complex. Conversely, undercoding (using broad categories) can oversimplify findings and obscure key insights. Inconsistent coding occurs when multiple researchers apply codes differently, leading to unreliable results. Additionally, confirmation bias can occur if researchers force data into predefined themes rather than allowing themes to emerge naturally. To avoid these mistakes, use a well-structured coding framework, regularly review coding decisions, and consider AI-driven qualitative analysis tools for greater accuracy and consistency.
Yes, real-time qualitative data coding is possible using AI-driven thematic analysis software like Thematic, which uses natural language processing (NLP) to categorize feedback instantly. This is particularly useful for businesses handling large volumes of customer reviews, surveys, and support conversations. While manual coding is time-intensive, automated solutions enable scalable, consistent analysis in real time that is less exposed to individual coders' biases. However, human oversight is still crucial: researchers should regularly review AI-generated themes to refine them and ensure accuracy. Combining automated and manual coding provides the best balance of speed and precision.
Inductive coding is ideal when exploring new datasets without predefined themes. It allows insights to emerge naturally, making it useful for open-ended surveys or customer feedback. Deductive coding, on the other hand, applies a predefined framework, making it effective for research with specific objectives. A hybrid approach is often best—start with inductive coding to uncover unexpected insights, then refine the analysis with deductive coding. Tools like Thematic use AI to identify themes inductively while allowing researchers to apply structured taxonomies for more targeted analysis, combining the strengths of both methods.